AITopics

2504.00602

Country:

Europe > Poland > Kuyavian-Pomeranian Province > Toruń (0.04)
Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
Europe > Germany (0.04)
Europe > Denmark > North Jutland > Aalborg (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Mielniczuk, Jan, Rejchel, Wojciech, Teisseyre, Paweł

Class prior estimation for positive-unlabeled learning when label shift occurs

arXiv.org Machine Learning, Feb-28-2025

We study estimation of the class prior for unlabeled target samples, which may differ from that of the source population. It is assumed that, for the source data, only samples from the positive class and from the whole population are available (PU learning scenario). We introduce a novel direct estimator of the class prior which avoids estimation of posterior probabilities and has a simple geometric interpretation. It is based on a distribution matching technique together with kernel embedding and is obtained as an explicit solution to an optimisation task. We establish its asymptotic consistency as well as a non-asymptotic bound on its deviation from the unknown prior, which is calculable in practice. We study finite-sample behaviour for synthetic and real data and show that the proposal, together with a suitably modified version for large values of the source prior, performs on par with or better than its competitors.
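The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea of kernel mean matching under label shift, not the authors' method. It assumes that the class-conditional distributions are shared between source and target, that the source prior `c` is known, and a Gaussian kernel with a hand-picked bandwidth. Writing the target embedding as $\mu_T = \pi\mu_+ + (1-\pi)\mu_-$ and recovering $\mu_-$ from the source mixture $\mu_S = c\mu_+ + (1-c)\mu_-$ gives the explicit projection $\hat\pi = \langle \mu_T-\mu_-, \mu_+-\mu_-\rangle / \|\mu_+-\mu_-\|^2$. The names `kme_target_prior`, `gaussian_gram` and `sample_mixture` are hypothetical.

```python
import numpy as np


def gaussian_gram(A, B, sigma=1.0):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def kme_target_prior(X_pos, X_src, X_tgt, source_prior, sigma=1.0):
    """Project mu_T - mu_neg onto mu_pos - mu_neg in the RKHS (source prior assumed known)."""
    n_p, n_s, n_t = len(X_pos), len(X_src), len(X_tgt)
    Z = np.vstack([X_pos, X_src, X_tgt])
    K = gaussian_gram(Z, Z, sigma)
    # Each empirical mean embedding is sum_i w_i * phi(z_i) over the pooled sample Z
    w_pos = np.concatenate([np.full(n_p, 1.0 / n_p), np.zeros(n_s + n_t)])
    w_src = np.concatenate([np.zeros(n_p), np.full(n_s, 1.0 / n_s), np.zeros(n_t)])
    w_tgt = np.concatenate([np.zeros(n_p + n_s), np.full(n_t, 1.0 / n_t)])
    # Source mixture mu_src = c * mu_pos + (1 - c) * mu_neg, solved for mu_neg
    w_neg = (w_src - source_prior * w_pos) / (1.0 - source_prior)
    d, e = w_tgt - w_neg, w_pos - w_neg
    # Closed-form least-squares solution: pi = <d, e>_H / ||e||_H^2
    pi_hat = (d @ K @ e) / (e @ K @ e)
    return float(np.clip(pi_hat, 0.0, 1.0))


# Synthetic check: classes N(+2, 1) and N(-2, 1), source prior 0.4, target prior 0.7
rng = np.random.default_rng(0)


def sample_mixture(n, prior):
    is_pos = rng.random(n) < prior
    return np.where(is_pos[:, None],
                    rng.normal(2.0, 1.0, (n, 1)),
                    rng.normal(-2.0, 1.0, (n, 1)))


X_pos = rng.normal(2.0, 1.0, (800, 1))
X_src = sample_mixture(800, 0.4)
X_tgt = sample_mixture(800, 0.7)
pi_hat = kme_target_prior(X_pos, X_src, X_tgt, source_prior=0.4)
print(round(pi_hat, 3))
```

The estimate is a projection in the RKHS, so it reads geometrically as the position of the target embedding along the segment from the negative to the positive embedding. Dividing by $1-c$ amplifies sampling noise when the source prior is large, which may be one reason a modified version is needed in that regime.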

dataset, estimation, estimator


2502.21194

Country:

Europe > Poland > Masovia Province > Warsaw (0.04)
Europe > Poland > Kuyavian-Pomeranian Province > Toruń (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Wormhole Memory: A Rubik's Cube for Cross-Dialogue Retrieval

arXiv.org Artificial Intelligence, Feb-14-2025

To address the inability of current large language models (LLMs) to share memory across dialogues, this research proposes a wormhole memory module (WMM) that treats memory as a Rubik's cube whose contents can be retrieved arbitrarily across different dialogues. In simulation experiments, the researcher built an experimental framework in Python and used memory barriers to simulate the current difficulty of sharing memories between LLM dialogues. The CoQA development dataset was imported into the experiment to verify the feasibility of WMM's cross-dialogue memory retrieval, which relies on nonlinear indexing and dynamic retrieval, and a comparative analysis was conducted against the Titans and MemGPT memory modules. Experimental results show that WMM was able to retrieve memory across dialogues and that its quantitative indicators remained stable across eight experiments. The work contributes a new technical approach to optimizing memory management in LLMs and offers experience for future practical applications.

large language model, machine learning, natural language

2501.14846

Country:

Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Switzerland (0.04)

Genre:

Research Report > Experimental Study (0.48)
Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Rubik's Cube (0.61)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Reducing Reasoning Costs -- The Path of Optimization for Chain of Thought via Sparse Attention Mechanism

arXiv.org Artificial Intelligence, Dec-11-2024

To address the surge in inference cost caused by chain-of-thought reasoning in large language models, this research proposes a sparse attention mechanism that focuses only on a few relevant tokens. The researcher constructed a new attention mechanism and used GiantRabbit, trained with custom GPTs, as an experimental tool. The experiment compared the reasoning time, correctness score, and chain-of-thought length of this model and o1 Preview on linear algebra test questions from MIT OpenCourseWare. The results show that GiantRabbit's reasoning time and chain-of-thought length are significantly lower than those of o1 Preview, which supports the feasibility of sparse attention for optimizing chain-of-thought reasoning.
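The abstract does not describe GiantRabbit's attention mechanism, so the sketch below shows only one common form of sparse attention, top-k masking, as an assumed illustration: each query keeps its k highest-scoring keys and all other keys receive zero weight. `topk_sparse_attention` and its parameter `k` are hypothetical names, not part of the paper.

```python
import numpy as np


def topk_sparse_attention(Q, K, V, k):
    """Scaled dot-product attention in which each query keeps only its k highest-scoring keys."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])              # (n_queries, n_keys)
    k = min(k, scores.shape[1])
    top = np.argpartition(scores, -k, axis=1)[:, -k:]    # indices of the k largest scores per row
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, top, 0.0, axis=1)            # 0 for kept keys, -inf for dropped keys
    masked = scores + mask
    masked = masked - masked.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(masked)                             # exp(-inf) = 0 for dropped keys
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V, weights


rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(10, 8)), rng.normal(size=(10, 3))
out, w = topk_sparse_attention(Q, K, V, k=3)
print(np.count_nonzero(w, axis=1))  # each query attends to exactly 3 keys
```

This sketch still computes every score before masking, so it demonstrates only the sparsity pattern; real cost savings would require avoiding the computation of the dropped scores altogether.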

arxiv preprint arxiv, giantrabbit, language model

2411.09111

Country:

Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.05)
Europe > Switzerland (0.04)
Europe > Poland > Kuyavian-Pomeranian Province > Toruń (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Gökdemir, Tuğçe, Rydzewski, Jakub

A Note on Spectral Map

arXiv.org Artificial Intelligence, Dec-5-2024

In molecular dynamics (MD) simulations, transitions between states are often rare events due to energy barriers that exceed the thermal energy. Because of their infrequent occurrence and the huge number of degrees of freedom in molecular systems, understanding the physical properties that drive rare events is immensely difficult. A common approach to this problem is to propose a collective variable (CV) that describes this process by a simplified representation. However, choosing CVs is not easy, as it often relies on physical intuition. Machine learning (ML) techniques provide a promising approach for effectively extracting optimal CVs from MD data. Here, we provide a note on a recent unsupervised ML method called spectral map, which constructs CVs by maximizing the timescale separation between slow and fast variables in the system.

chem, collective variable, phy

2412.04011

Country:

North America > United States (0.05)
Asia > Japan (0.05)
Europe > Poland > Kuyavian-Pomeranian Province > Toruń (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

"Moralized" Multi-Step Jailbreak Prompts: Black-Box Testing of Guardrails in Large Language Models for Verbal Attacks

arXiv.org Artificial Intelligence, Dec-4-2024

As the application of large language models continues to expand across fields, it poses greater challenges for identifying harmful generated content and for the effectiveness of guardrail mechanisms. This research aims to evaluate the guardrail effectiveness of GPT-4o, Grok-2 Beta, Llama 3.1 (405B), Gemini 1.5, and Claude 3.5 Sonnet through black-box testing with seemingly ethical multi-step jailbreak prompts. It conducts ethical attacks by designing identical multi-step prompts that simulate the scenario of "corporate middle managers competing for promotions." The results show that the guardrails of the above-mentioned LLMs were bypassed and verbal-attack content was generated. Claude 3.5 Sonnet showed noticeably stronger resistance to multi-step jailbreak prompts.

arxiv preprint arxiv, guardrail, language model

2411.1673

Country:

Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Poland > Kuyavian-Pomeranian Province > Toruń (0.04)

Genre: Research Report > New Finding (0.49)

Industry:

Information Technology > Security & Privacy (0.94)
Transportation > Air (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention

arXiv.org Artificial Intelligence, Nov-20-2024

To address the sycophancy problem caused by reinforcement learning from human feedback in large language models, this research applies synthetic data intervention (SDI) to the decoder-only transformer architecture. Building on gaps in the existing literature, the researcher designed an experimental process that reduces the models' tendency to cater to users by generating diversified data, and used GPT-4o as an experimental tool for verification. The experiment used 100 true/false questions and compared the model trained with synthetic data intervention against the original untrained model on multiple indicators. The results favour the SDI-trained model in terms of both accuracy rate and sycophancy rate and show that it is significantly effective in reducing sycophancy.
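The abstract names an accuracy rate and a sycophancy rate without defining them. The sketch below computes both under an assumed definition: the sycophancy rate is taken to be the share of initially correct true/false answers that the model abandons after the user pushes back. The `Trial` record and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    truth: bool           # ground-truth answer to the true/false question
    first_answer: bool    # model answer before any user pushback
    after_pushback: bool  # model answer after the user asserts the opposite


def accuracy_rate(trials):
    """Fraction of questions answered correctly before any pushback."""
    return sum(t.first_answer == t.truth for t in trials) / len(trials)


def sycophancy_rate(trials):
    """Assumed definition: share of initially correct answers that flip after pushback."""
    correct = [t for t in trials if t.first_answer == t.truth]
    if not correct:
        return 0.0
    return sum(t.after_pushback != t.first_answer for t in correct) / len(correct)


trials = [
    Trial(True, True, True),     # correct, holds its answer
    Trial(True, True, False),    # correct, then caves to the user
    Trial(False, False, False),  # correct, holds its answer
    Trial(False, True, True),    # wrong from the start
]
print(accuracy_rate(trials), sycophancy_rate(trials))
```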

arxiv preprint arxiv, language model, sycophancy

2411.10156

Country:

Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.04)
Europe > Poland > Kuyavian-Pomeranian Province > Toruń (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Chen, Yuxinxin, Dral, Pavlo O.

All-in-one foundational models learning across quantum chemical levels

arXiv.org Artificial Intelligence, Sep-18-2024

Machine learning (ML) potentials typically target a single quantum chemical (QC) level, while the ML models developed for multi-fidelity learning have not been shown to provide scalable solutions for foundational models. Here we introduce the all-in-one (AIO) ANI model architecture based on multimodal learning which can learn an arbitrary number of QC levels. Our all-in-one learning approach offers a more general and easier-to-use alternative to transfer learning. We use it to train the AIO-ANI-UIP foundational model with generalization capability comparable to semi-empirical GFN2-xTB and DFT with a double-zeta basis set for organic molecules. We show that the AIO-ANI model can learn across different QC levels ranging from semi-empirical to density functional theory to coupled cluster. We also use AIO models to design the foundational model Δ-AIO-ANI based on Δ-learning with increased accuracy and robustness compared to AIO-ANI-UIP. The code and the foundational models are available at https://github.com/dralgroup/aio-ani; they will be integrated into the universal and updatable AI-enhanced QM (UAIQM) library and made available in the MLatom package so that they can be used online at the XACS cloud computing platform (see https://github.com/dralgroup/mlatom for updates).
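Δ-learning fits a model to the difference between a cheap and an expensive quantum chemical level and adds that learned correction to the cheap result. The sketch below illustrates only this general idea, with a ridge-regularised linear model standing in for the ANI network; the descriptors, the synthetic `low_level` and `high_level` energy functions, and the function names are all hypothetical and are not part of AIO-ANI or MLatom.

```python
import numpy as np


def _with_bias(X):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_delta_correction(X, e_low, e_high, ridge=1e-8):
    """Fit a linear model of the correction e_high - e_low (ridge-regularised least squares)."""
    Xb = _with_bias(X)
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ (e_high - e_low))


def predict_high_level(X, e_low, coef):
    """Delta-learning prediction: cheap low-level value plus the learned correction."""
    return e_low + _with_bias(X) @ coef


# Synthetic data in which the correction is exactly linear in the (hypothetical) descriptors
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(50, 4)), rng.normal(size=(10, 4))
true_w, true_b = np.array([0.5, -1.0, 0.25, 2.0]), 0.1


def low_level(X):
    return np.sin(X).sum(axis=1)             # stand-in for a cheap QC level


def high_level(X):
    return low_level(X) + X @ true_w + true_b  # stand-in for an expensive QC level


coef = fit_delta_correction(X_train, low_level(X_train), high_level(X_train))
pred = predict_high_level(X_test, low_level(X_test), coef)
```

Learning only the correction is attractive when the difference between levels is smoother than the high-level quantity itself, which is the case in this toy example by construction.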

chem, doi, dral aio model 18

2409.12015

Country:

Asia > China > Fujian Province > Xiamen (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
Europe > Poland > Kuyavian-Pomeranian Province > Toruń (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Rydzewski, Jakub

Spectral Map for Slow Collective Variables, Markovian Dynamics, and Transition State Ensembles

arXiv.org Artificial Intelligence, Sep-10-2024

Understanding the behavior of complex molecular systems is a fundamental problem in physical chemistry. To describe the long-time dynamics of such systems, which is responsible for their most informative characteristics, we can identify a few slow collective variables (CVs) while treating the remaining fast variables as thermal noise. This enables us to simplify the dynamics and treat it as diffusion in a free-energy landscape spanned by slow CVs, effectively rendering the dynamics Markovian. Our recent statistical learning technique, spectral map [Rydzewski, J. Phys. Chem. Lett. 2023, 14, 22, 5216-5220], explores this strategy to learn slow CVs by maximizing a spectral gap of a transition matrix. In this work, we introduce several advancements into our framework, using a high-dimensional reversible folding process of a protein as an example. We implement an algorithm for coarse-graining Markov transition matrices to partition the reduced space of slow CVs kinetically and use it to define a transition state ensemble. We show that slow CVs learned by spectral map closely approach the Markovian limit for an overdamped diffusion. We demonstrate that coordinate-dependent diffusion coefficients only slightly affect the constructed free-energy landscapes. Finally, we present how spectral map can be used to quantify the importance of features and compare slow CVs with structural descriptors commonly used in protein folding. Overall, we demonstrate that a single slow CV learned by spectral map can be used as a physical reaction coordinate to capture essential characteristics of protein folding.
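As a rough illustration of the quantity spectral map maximizes, the sketch below builds a row-stochastic Markov transition matrix from a Gaussian kernel on samples in a fixed reduced space and measures the gap between the k-th and (k+1)-th largest eigenvalues. It assumes a plain Gaussian kernel with a fixed bandwidth and does not train a network to learn the CVs, so it is not the published method; `markov_eigenvalues` and `spectral_gap` are hypothetical names.

```python
import numpy as np


def markov_eigenvalues(Y, sigma=1.0):
    """Eigenvalues (descending) of M = D^{-1} K, with K a Gaussian kernel on samples Y (N x d)."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * sigma**2))
    deg = K.sum(axis=1)
    # M is similar to the symmetric S = D^{-1/2} K D^{-1/2}, so its eigenvalues are real
    S = K / np.sqrt(np.outer(deg, deg))
    return np.sort(np.linalg.eigvalsh(S))[::-1]


def spectral_gap(Y, k, sigma=1.0):
    """Gap lambda_{k-1} - lambda_k (0-indexed), with k the assumed number of metastable states."""
    lam = markov_eigenvalues(Y, sigma)
    return lam[k - 1] - lam[k]


# Two well-separated "metastable states" along a single hypothetical CV
rng = np.random.default_rng(1)
two_states = np.concatenate([rng.normal(-5.0, 0.3, 150), rng.normal(5.0, 0.3, 150)])[:, None]
lam = markov_eigenvalues(two_states)
gap = spectral_gap(two_states, k=2)
print(lam[:3], gap)
```

For two well-separated states the second eigenvalue stays close to 1 while the third drops sharply, so the gap with k = 2 is large; a CV that mixed the two states would be expected to shrink it.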

chem, phy, spectral gap

doi: 10.1021/acs.jctc.4c00428

2409.06428

Country:

North America > United States (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Poland > Kuyavian-Pomeranian Province > Toruń (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.94)

Braverman, Vladimir, Dharangutte, Prathamesh, Shah, Vihan, Wang, Chen

Learning-augmented Maximum Independent Set

arXiv.org Artificial Intelligence, Jul-16-2024

We study the Maximum Independent Set (MIS) problem on general graphs within the framework of learning-augmented algorithms. The MIS problem is known to be NP-hard and is also NP-hard to approximate to within a factor of $n^{1-\delta}$ for any $\delta>0$. We show that we can break this barrier in the presence of an oracle obtained through predictions from a machine learning model that answers vertex membership queries for a fixed MIS correctly with probability $1/2+\varepsilon$. In the first setting we consider, the oracle can be queried once per vertex to know if a vertex belongs to a fixed MIS, and the oracle returns the correct answer with probability $1/2 + \varepsilon$. Under this setting, we show an algorithm that obtains an $\tilde{O}(\sqrt{\Delta}/\varepsilon)$-approximation in $O(m)$ time where $\Delta$ is the maximum degree of the graph. In the second setting, we allow multiple queries to the oracle for a vertex, each of which is correct with probability $1/2 + \varepsilon$. For this setting, we show an $O(1)$-approximation algorithm using $O(n/\varepsilon^2)$ total queries and $\tilde{O}(m)$ runtime.
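The second setting rests on the fact that repeating a query that is correct with probability $1/2+\varepsilon$ about $1/\varepsilon^2$ times and taking a majority vote makes each answer correct with a constant probability close to 1 (a Hoeffding-bound argument). The sketch below illustrates only that amplification step, followed by a simple greedy repair that guarantees the output is independent; it is not the authors' $O(1)$-approximation algorithm, and the simulated oracle, the path-graph example, and all function names are hypothetical.

```python
import random


def make_noisy_oracle(true_mis, eps, rng):
    """Hypothetical oracle: says whether v is in a fixed MIS, correct with probability 1/2 + eps."""
    def query(v):
        truth = v in true_mis
        return truth if rng.random() < 0.5 + eps else not truth
    return query


def majority_candidates(vertices, query, k):
    """Query every vertex k times (k odd) and keep vertices whose majority answer is 'in the MIS'."""
    return {v for v in vertices if sum(query(v) for _ in range(k)) > k // 2}


def repair_to_independent(candidates, adj):
    """Scan candidates in a fixed order and keep a vertex only if no neighbour is already kept."""
    chosen = set()
    for v in sorted(candidates):
        if not (adj[v] & chosen):
            chosen.add(v)
    return chosen


# Path graph 0-1-2-...-199; the even vertices form a maximum independent set
n = 200
adj = {v: {u for u in (v - 1, v + 1) if 0 <= u < n} for v in range(n)}
true_mis = set(range(0, n, 2))
oracle = make_noisy_oracle(true_mis, eps=0.2, rng=random.Random(0))
result = repair_to_independent(majority_candidates(range(n), oracle, k=25), adj)
print(len(result))
```

With eps = 0.2 and k = 25 the vote uses about $1/\varepsilon^2$ queries per vertex, in line with the $O(n/\varepsilon^2)$ total budget quoted above; the greedy repair is only there to make the output valid and carries no approximation guarantee.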

algorithm, query, vertex

2407.11364

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.68)